Make copy/graft/prune work with unevenly distributed rows by lutter · Pull Request #5807 · graphprotocol/graph-node

lutter · 2025-02-08T02:02:04Z

When we copy/graft/prune, we split the entire work into batches that are meant to take roughly three minutes each to avoid bloating the subgraph_deployment table. Pruning causes a serious problem for that batching, and when it happens, it can cripple the performance of the overall system.
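
A minimal sketch of the kind of size adjustment this relies on, assuming the next batch is scaled by target duration / measured duration. graph-node does have an `AdaptiveBatchSize`, but the struct and `adapt` method below are simplified, hypothetical stand-ins, not its actual code. The proportional scaling is only sound if the time per unit of work is the same everywhere in the table.

```rust
use std::time::Duration;

/// Hypothetical, simplified adaptive batch size: after each batch, rescale
/// so the next batch should take about `target`, assuming the time per unit
/// of work stays constant (the linear assumption).
struct AdaptiveBatchSize {
    size: i64,
    target: Duration,
}

impl AdaptiveBatchSize {
    fn new(initial: i64, target: Duration) -> Self {
        AdaptiveBatchSize { size: initial, target }
    }

    /// Rescale from how long the last batch of `self.size` units took
    fn adapt(&mut self, took: Duration) {
        // Guard against dividing by (almost) zero for trivially fast batches
        let secs = took.as_secs_f64().max(0.001);
        let scaled = self.size as f64 * self.target.as_secs_f64() / secs;
        // Never shrink below one unit of work
        self.size = (scaled as i64).max(1);
    }
}
```

Starting from 10,000 with a 180 s target, a batch that takes 90 s doubles the size to 20,000, and a batch that finishes almost instantly makes the size explode, which is what a sparse, pruned stretch of the table triggers.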

The code that adjusts the batch size to hit that target tacitly assumes that the actual work is distributed linearly over vids: if we ask for work covering 10,000 vids, we are fine with getting fewer rows, maybe even just a handful, but the density has to be uniform, i.e., any 10,000-vid batch needs to contain roughly the same number of rows. Pruning breaks this assumption, since in a pruned subgraph the beginning of the subgraph (as determined by block numbers) is much sparser than the later parts. In one case, this misled the estimation logic into eventually trying to copy a range of 160M vids, since that's what the early part of the subgraph indicated could be copied in three minutes: the subgraph was pruned, and that range of 160M vids contained only 128 rows at the beginning of the subgraph. After that, the subgraph was dense, and copying 160M vids would take many hours.
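
To make the arithmetic concrete, a back-of-the-envelope sketch. It assumes batch duration is proportional to the number of rows actually touched and uses an illustrative copy rate of 10,000 rows/s; neither the cost model nor the rate comes from graph-node. Only the 160M-vid range and the 128 rows are from the case described here.

```rust
/// Hypothetical cost model (an assumption): a batch's duration depends on
/// how many rows it actually contains, at a fixed rate, not on its vid width.
fn batch_secs(rows: u64, rows_per_sec: f64) -> f64 {
    rows as f64 / rows_per_sec
}

/// Duration of the same 160M-vid range in the sparse, pruned prefix of the
/// table versus a dense part where every vid is a live row.
fn sparse_vs_dense(rows_per_sec: f64) -> (f64, f64) {
    const VID_RANGE: u64 = 160_000_000;
    let sparse = batch_secs(128, rows_per_sec); // 128 live rows in the range
    let dense = batch_secs(VID_RANGE, rows_per_sec); // one row per vid
    (sparse, dense)
}
```

At the assumed 10,000 rows/s, the sparse range finishes in about 13 ms while the dense one takes 16,000 s, roughly 4.4 hours, even though both span the same number of vids.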

This PR removes the assumption that the relation between vid and actual rows is linear. It uses the histogram_bounds from pg_stats to build a piecewise linear function (the Ogive in the code) and uses it to estimate the number of rows in a given vid range. Now, when we ask for a batch of 10,000 rows, the code adapts to an uneven vid distribution and returns vid ranges of different sizes for different parts of the table.
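
A minimal sketch of the idea, assuming an equi-depth histogram in which each pair of adjacent bounds encloses the same number of rows (the way Postgres describes `histogram_bounds`) and a known total row count. `Ogive`, `from_equi_histogram`, `value`, and `inverse` are hypothetical names for this sketch, not necessarily the API in `graph/src/util/ogive.rs`. To size a batch of n rows starting at vid `start`, the end vid is estimated as `inverse(value(start) + n)`.

```rust
/// Hypothetical ogive: a cumulative, piecewise linear estimate of how many
/// rows have vid <= x, built from equi-depth histogram bounds.
struct Ogive {
    /// Strictly increasing bucket boundaries (vids)
    points: Vec<i64>,
    /// Estimated cumulative row count at each boundary
    values: Vec<f64>,
}

impl Ogive {
    /// Assumes every bucket between adjacent bounds holds the same number
    /// of rows, so the count grows by total_rows / buckets per boundary.
    fn from_equi_histogram(bounds: &[i64], total_rows: f64) -> Ogive {
        assert!(bounds.len() >= 2, "need at least one bucket");
        assert!(bounds.windows(2).all(|w| w[0] < w[1]), "bounds must increase");
        let buckets = (bounds.len() - 1) as f64;
        let values = (0..bounds.len())
            .map(|i| total_rows * i as f64 / buckets)
            .collect();
        Ogive { points: bounds.to_vec(), values }
    }

    /// Estimated number of rows with vid <= x, interpolating within a bucket
    fn value(&self, x: i64) -> f64 {
        let last = self.points.len() - 1;
        if x <= self.points[0] {
            return self.values[0];
        }
        if x >= self.points[last] {
            return self.values[last];
        }
        // First boundary strictly greater than x; x lies in bucket [i - 1, i]
        let i = self.points.partition_point(|&p| p <= x);
        let (x0, x1) = (self.points[i - 1] as f64, self.points[i] as f64);
        let (y0, y1) = (self.values[i - 1], self.values[i]);
        y0 + (y1 - y0) * (x as f64 - x0) / (x1 - x0)
    }

    /// Estimated vid at which the cumulative row count reaches y
    fn inverse(&self, y: f64) -> i64 {
        let last = self.values.len() - 1;
        if y <= self.values[0] {
            return self.points[0];
        }
        if y >= self.values[last] {
            return self.points[last];
        }
        // First boundary whose count is >= y; y lies in bucket [i - 1, i]
        let i = self.values.partition_point(|&v| v < y);
        let (x0, x1) = (self.points[i - 1] as f64, self.points[i] as f64);
        let (y0, y1) = (self.values[i - 1], self.values[i]);
        (x0 + (x1 - x0) * (y - y0) / (y1 - y0)).ceil() as i64
    }
}
```

With bounds `[0, 100_000_000, 100_010_000]` and 20,000 rows, the first bucket is sparse and the second dense: a 1,000-row batch starting at vid 0 ends around vid 10,000,000, while one starting at vid 100,000,000 ends at 100,001,000, a range of only 1,000 vids.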

mangas · 2025-02-10T15:49:05Z

graph/src/util/ogive.rs

+///
+/// The word 'ogive' is somewhat obscure, but has a lot fewer letters than
+/// 'piecewise linear function'. Copilot claims that it is also a lot
+/// more fun to say.


lutter self-assigned this Feb 8, 2025

mangas self-requested a review February 8, 2025 10:13

mangas reviewed Feb 10, 2025

lutter mentioned this pull request Feb 11, 2025: store: Try to avoid pathological batch size adjustments #5792 (Closed)

lutter force-pushed the lutter/nonlinear-batch branch 2 times, most recently from ff607b0 to b19b6c1 on February 11, 2025 01:53

mangas approved these changes Feb 11, 2025

lutter added 9 commits February 11, 2025 12:45

01e6dd5 store: Do not assume that copies start at vid == 0
229d95e store: Start copies at the minimum vid, not just at 0
a0860d4 graph: Add utility for handling cumulative histograms
f2d5e44 store: Move AdaptiveBatchSize to its own module
8749f20 store: Move batching logic for copies into separate struct
d63782c store: Introduce a VidRange struct
c4844ce store: Use VidRange for pruning
5f648ff store: Use VidBatcher to batch pruning queries
157c291 store: Remove unused ToSql/FromSql impls for AdaptiveBatchSize
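
The commits above introduce a `VidRange` and a `VidBatcher`. A minimal, hypothetical sketch of how such pieces might fit together, assuming an inclusive vid range that starts at the table's minimum vid rather than 0, and a caller-supplied closure that picks where each batch ends (for example from a row-count estimate). The field names, `is_empty`, and the `batches` function are illustrative, not graph-node's actual types.

```rust
/// Hypothetical inclusive range of vids, [start, end]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct VidRange {
    start: i64,
    end: i64,
}

impl VidRange {
    /// A range is empty once start has moved past end
    fn is_empty(&self) -> bool {
        self.start > self.end
    }
}

/// Split `range` into consecutive, non-overlapping batches. `next_end` gets
/// the start of a batch and returns the vid where that batch should end; the
/// result is clamped so each batch is non-empty and stays inside `range`.
fn batches(range: VidRange, mut next_end: impl FnMut(i64) -> i64) -> Vec<VidRange> {
    let mut out = Vec::new();
    let mut rest = range;
    while !rest.is_empty() {
        let end = next_end(rest.start).clamp(rest.start, rest.end);
        out.push(VidRange { start: rest.start, end });
        rest.start = end + 1;
    }
    out
}
```

For a table whose vids run from 1,000 to 3,499, a fixed step of 1,000 vids yields three batches, the last one cut short at 3,499; a density-aware closure would instead return wide ranges in sparse regions and narrow ones in dense regions.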

lutter force-pushed the lutter/nonlinear-batch branch from b19b6c1 to 157c291 on February 11, 2025 20:45

lutter merged commit 157c291 into master Feb 11, 2025
6 checks passed

lutter deleted the lutter/nonlinear-batch branch February 11, 2025 20:56
